Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NO-JIRA: Stop exposing the entire host filesystem #1085

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jmencak
Copy link
Contributor

@jmencak jmencak commented Jun 11, 2024

With the principle of least privilege in mind, stop exposing the
entirety of the host filesystem. Previously, we were mounting
the entire host filesystem to "/host". With this change, only
the necessary host directories for using TuneD are mounted and
a host's /var/lib directory for the operand's persistent
/var/lib/ocp-tuned directory.

Other changes

  • For e2e tests, use MCD to write to host's /etc/sysctl.d/ since
    we are no longer exposing the entire host filesystem for the NTO
    operand.

@openshift-ci openshift-ci bot requested review from dagrayvid and rbaturov June 11, 2024 11:02
Copy link
Contributor

openshift-ci bot commented Jun 11, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jmencak

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 11, 2024
Copy link
Contributor

@ffromani ffromani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea, questions about details of the implementation

assets/tuned/manifests/ds-tuned.yaml Show resolved Hide resolved
test/e2e/basic/metrics_cert_rotation.go Show resolved Hide resolved
}
if pod != nil {
util.ExecAndLogCommand("oc", "exec", "-n", ntoconfig.WatchNamespace(), pod.Name, "--", "rm", sysctlFile)
util.ExecAndLogCommand("oc", "debug", fmt.Sprintf("no/%s", node.Name), "--", "rm", sysctlFile)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here and following: could you please elaborate why oc debug exposes less than oc exec? It seems equivalente (or perhaps worse) than what we have currently

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for looking! True, however, this is code in e2e tests. The main idea is not to expose the entire /host in Tuned pods. Happy to share internal links to customer questions why we are exposing the entire /host over Slack if you're interested.

@jmencak
Copy link
Contributor Author

jmencak commented Jun 11, 2024

  STEP: writing /host/etc/sysctl.d/zzz.conf override file on the host with net.ipv4.neigh.default.gc_thresh1=256 @ 06/11/24 11:55:45.647
  run command 'oc [debug no/ip-10-0-23-102.us-west-2.compute.internal -- sh -c echo net.ipv4.neigh.default.gc_thresh1=256 > /host/etc/sysctl.d/zzz.conf; sync /host/etc/sysctl.d/zzz.conf]':
    out=
    err=error: unable to get namespace namespaces "ci-op-371fk0cx" not found
    ret=exit status 1

Hmm, looks like we might not be able to use oc debug in the CI? Let's see if we can use MCO for this.

Also,
/cc @liqcui
for awareness. Is this PR going to break any QE tests that you might have?

@openshift-ci openshift-ci bot requested a review from liqcui June 11, 2024 13:42
@liqcui
Copy link

liqcui commented Jun 11, 2024

  STEP: writing /host/etc/sysctl.d/zzz.conf override file on the host with net.ipv4.neigh.default.gc_thresh1=256 @ 06/11/24 11:55:45.647
  run command 'oc [debug no/ip-10-0-23-102.us-west-2.compute.internal -- sh -c echo net.ipv4.neigh.default.gc_thresh1=256 > /host/etc/sysctl.d/zzz.conf; sync /host/etc/sysctl.d/zzz.conf]':
    out=
    err=error: unable to get namespace namespaces "ci-op-371fk0cx" not found
    ret=exit status 1

Hmm, looks like we might not be able to use oc debug in the CI? Let's see if we can use MCO for this.

Also, /cc @liqcui for awareness. Is this PR going to break any QE tests that you might have?

Hi Jiri, Thank you for prompting me, We use openshift-node-tuning-operator namespace instead of ci-xxx ns when execute debug command. you can try this

@jmencak
Copy link
Contributor Author

jmencak commented Jun 11, 2024

Hi Jiri, Thank you for prompting me, We use openshift-node-tuning-operator namespace instead of ci-xxx ns when execute debug command. you can try this

Aah, good point, thank you. NM, I've just switched to using MCO/MCD. I don't have a problem with either approach though. The ping was meant as a heads up so that I don't break any QE's tests that might use /host in Tuned pods.

@jmencak jmencak changed the title Stop exposing the entire host filesystem NO-JIRA: Stop exposing the entire host filesystem Jul 1, 2024
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jul 1, 2024
@openshift-ci-robot
Copy link
Contributor

@jmencak: This pull request explicitly references no jira issue.

In response to this:

With the principle of least privilege in mind, stop exposing the entirety of the host filesystem.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@MarSik
Copy link
Contributor

MarSik commented Jul 10, 2024

@jmencak I think you should be more explicit in the cover message and in the commit about the internal changes to test infrastructure. Apart from that it looks good to me.

With the principle of least privilege in mind, stop exposing the
entirety of the host filesystem.  Previously, we were mounting
the entire host filesystem to "/host".  With this change, only
the necessary host directories for using TuneD are mounted and
a host's /var/lib directory for the operand's persistent
/var/lib/ocp-tuned directory.

Other changes
  * For e2e tests, use MCD to write to host's /etc/sysctl.d/ since
    we are no longer exposing the entire host filesystem for the NTO
    operand.
@jmencak jmencak force-pushed the 4.17-tuned-no-host branch from 18ad80f to 6a1398f Compare July 10, 2024 07:27
@jmencak
Copy link
Contributor Author

jmencak commented Jul 10, 2024

Since this affects #1019 GetMachineConfigDaemonForNode(), adding
/hold
Happy to rebase once @ffromani merges #1019

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 10, 2024
@Tal-or
Copy link
Contributor

Tal-or commented Jul 10, 2024

@jmencak are these kind of tests running on hypershift as well?
If yes, hypershift doesn't have the MCD concept

@jmencak
Copy link
Contributor Author

jmencak commented Jul 10, 2024

@jmencak are these kind of tests running on hypershift as well? If yes, hypershift doesn't have the MCD concept

Thank you for looking. No, I'm not aware of any lanes running these tests on HyperShift hosted clusters. In any case, this would not work prior this PR anyway. The "reboots" e2e tests rely on MCO.

@jmencak
Copy link
Contributor Author

jmencak commented Jul 10, 2024

/retest

Copy link
Contributor

openshift-ci bot commented Jul 10, 2024

@jmencak: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-bot
Copy link
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 9, 2024
@openshift-bot
Copy link
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 8, 2024
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 8, 2024
@openshift-merge-robot
Copy link
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@jmencak
Copy link
Contributor Author

jmencak commented Nov 8, 2024

/remove-lifecycle rotten

@openshift-ci openshift-ci bot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Nov 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants